
    Self-Supervised and Controlled Multi-Document Opinion Summarization

    We address the problem of unsupervised abstractive summarization of collections of user-generated reviews with self-supervision and control. We propose a self-supervised setup that considers an individual document as a target summary for a set of similar documents. This setting makes training simpler than previous approaches by relying only on the standard log-likelihood loss. We address the problem of hallucinations through the use of control codes, which steer the generation towards more coherent and relevant summaries. Finally, we extend the Transformer architecture to allow for multiple reviews as input. Our benchmarks on two datasets against graph-based and recent neural abstractive unsupervised models show that our proposed method generates summaries of superior quality and relevance. This is confirmed by our human evaluation, which focuses explicitly on the faithfulness of generated summaries. We also provide an ablation study, which shows the importance of the control setup in limiting hallucinations and in achieving high sentiment and topic alignment between the summaries and the input reviews. Comment: 18 pages including 5 pages of appendix.
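The leave-one-out construction of training pairs described above can be sketched in a few lines: each review serves as the pseudo-summary target for its most similar neighbors. This is an illustrative sketch only; the function names and the use of token-level Jaccard similarity are assumptions, not the paper's actual implementation.

```python
def jaccard(a, b):
    """Token-level Jaccard similarity between two reviews."""
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def make_pairs(reviews, k=3):
    """For each review, treat it as the pseudo-summary target and use
    its k most similar other reviews as the multi-document input."""
    pairs = []
    for i, target in enumerate(reviews):
        scored = sorted(
            ((jaccard(target, r), j) for j, r in enumerate(reviews) if j != i),
            reverse=True,
        )
        inputs = [reviews[j] for _, j in scored[:k]]
        pairs.append((inputs, target))
    return pairs
```

Each `(inputs, target)` pair can then be fed to a standard sequence-to-sequence model trained with log-likelihood loss.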

    Algorithms for the Efficient Search of Similar Instances (Algoritmos para la búsqueda eficiente de instancias similares)

    Thesis (Lic. in Computer Science)--Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía y Física, 2007. In this work we take on the challenge of searching for similar objects within a very large collection. The problem poses two difficulties: first, defining a similarity measure between two objects, and then implementing an algorithm that, based on that measure, efficiently finds the objects that are sufficiently alike. The presented solution uses a measure strongly based on the concepts of precision and recall, yielding a measure similar to Jaccard's. The efficiency of the algorithm lies in first generating groups of similar objects, and only afterwards looking these objects up in the database. We applied this algorithm in two settings: on the one hand, to a database of users who rate movies, in order to predict those ratings; on the other, to find genetic profiles that may have contributed to a piece of genetic evidence. Author: Matthias Gallé.

    Searching for Smallest Grammars on Large Sequences and Application to DNA

    Motivated by the inference of the structure of genomic sequences, we address here the smallest grammar problem. In previous work, we introduced a new perspective on this problem, splitting the task into two different optimization problems: choosing which words will be considered constituents of the final grammar, and finding a minimal parsing with these constituents. Here we focus on making these ideas applicable to large sequences. First, we improve the complexity of existing algorithms by using the concept of maximal repeats when choosing which substrings will be the constituents of the grammar. Then, we improve the size of the grammars by cautiously adding a minimal parsing optimization step. Together, these approaches enable us to propose new practical algorithms that return smaller grammars (by up to 10%) in approximately the same amount of time as their competitors on a classical set of genomic sequences and on whole genomes of model organisms.
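The second of the two subproblems, finding a minimal parsing once the constituents are fixed, admits a simple dynamic-programming sketch: at each position, either emit a single character or close a whole constituent word. This is an illustrative reconstruction under that reading of the abstract, not the paper's algorithm, and the function name is an assumption.

```python
def minimal_parsing(seq, constituents):
    """Fewest symbols needed to cover seq, where a symbol is either
    a single character or one occurrence of a constituent word."""
    n = len(seq)
    best = [0] + [float("inf")] * n  # best[i] = optimal cost of seq[:i]
    for i in range(1, n + 1):
        best[i] = best[i - 1] + 1    # emit seq[i-1] as a single character
        for w in constituents:
            if len(w) <= i and seq[i - len(w):i] == w:
                best[i] = min(best[i], best[i - len(w)] + 1)
    return best[n]
```

For example, with constituents `["ab"]` the string `"abab"` parses into two symbols instead of four characters.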

    In-place Update of Suffix Array while Recoding Words

    Motivated by grammatical inference and data compression applications, we propose an algorithm to update a suffix array after the substitution, in the indexed text, of some occurrences of a given word by a new character. Compared to other published index update methods, the problem addressed here may require the modification of a large number of distinct positions over the original text. The proposed algorithm uses the specific internal order of suffix arrays to update groups of entries simultaneously, and ensures that only entries that need to be modified are visited. Experiments confirm a significant execution-time speed-up compared to rebuilding the suffix array from scratch at each step of the application.
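The operation being indexed, and the from-scratch baseline the paper compares against, can be sketched as follows: occurrences of a word are recoded as a single new character, and the suffix array is then rebuilt. The in-place update algorithm itself is what the paper contributes and is not shown here; the function names and the naive construction are assumptions for illustration.

```python
def recode(text, word, symbol):
    """Replace every occurrence of `word` with the single new character `symbol`."""
    return text.replace(word, symbol)

def suffix_array(text):
    """Naive from-scratch construction: sort all suffix start positions
    lexicographically. This is the baseline rebuilt at each step."""
    return sorted(range(len(text)), key=lambda i: text[i:])
```

Rebuilding after every recoding step costs a full construction each time, which is the redundancy the proposed in-place update avoids.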

    BigScience: A Case Study in the Social Construction of a Multilingual Large Language Model

    The BigScience Workshop was a value-driven initiative that spanned one and a half years of interdisciplinary research and culminated in the creation of ROOTS, a 1.6TB multilingual dataset that was used to train BLOOM, one of the largest multilingual language models to date. In addition to the technical outcomes and artifacts, the workshop fostered multidisciplinary collaborations around large models, datasets, and their analysis. This in turn led to a wide range of research publications spanning topics from ethics and law to data governance, modeling choices, and distributed training. This paper focuses on the collaborative research aspects of BigScience and takes a step back to look at the challenges of large-scale participatory research, with respect to participant diversity and the tasks required to successfully carry out such a project. Our main goal is to share the lessons we learned from this experience, what we could have done better, and what we did well. We show how the impact of such a social approach to scientific research goes well beyond the technical artifacts that were the basis of its inception. Comment: Presented at the 2022 NeurIPS Workshop on Broadening Research Collaborations in ML.